When humans converse with each other, they naturally amalgamate information from multiple modalities (i.e., speech, gestures, speech prosody, facial expressions, and eye gaze). This paper focuses on eye gaze and its combination with speech. We develop a model that resolves references to visual (screen) elements in a conversational web browsing system. The system detects eye gaze, recognizes speech, and then interprets the user’s browsing intent (e.g., click on a specific element) through a combination of spoken language understanding and eye gaze tracking. We experiment with multi-turn interactions collected in a wizard-of-Oz scenario where users are asked to perform several web-browsing tasks. We compare several gaze features and evalua...
In this paper, we present a strongly embodied take on the phenomenon of viewpoint by exploring the r...
In this paper, we present an embodiment perspective on viewpoint by exploring the role of eye gaze i...
In the context of synthetic generation and decoding of linguistic information, not only the audible ...
Recent years have witnessed a growing interest in multimodal features of language use, both for theo...
Recent years have witnessed a growing interest in multimodal features of language use, both for theo...
In a conversational system, determining a user’s focus of attention is crucial to the success of the...
In multimodal human-machine conversation, successfully interpreting human attention is critical. Wh...
Gaze and language are major pillars in multimodal communication. Gaze is a non-verbal mechanism that...
Humans are inherently skilled at using subtle physiological cues from other persons, for example gaz...
Recent studies in conversation analysis, psycholinguistics and interaction technology have pointed a...
Recent years have witnessed a growing interest in multimodal features of spoken language (Müller et ...
Recent years have witnessed a growing interest in multimodal features of spoken language (Müller et ...
Abstract: We report a series of eye-tracking studies investigating different facets of how seeing a ...
Speech has been used as the foundation for many human/machine interactive systems to convey the user...
In multi-agent, multi-user environments, users as well as agents should have a means of establishing...